
	**This file creates sample restrictions with Census population data**	

			**	**	**	**	**	**	**	**	**	**	**	**	**
			**	**	**	**	**	**	**	**	**	**	**	**	**
			**	**	**	**	**	**	**	**	**	**	**	**	**
	


*-------------------------------------------------------------------------------
* Start with census county data
*-------------------------------------------------------------------------------

insheet using "DEC_10_SF1_QTP3_with_ann.csv", names clear

**use the first data row (the descriptive column labels) as variable names;
**capture skips any rename that fails (e.g. a duplicate name)
foreach var of varlist * {
 capture rename `var' `=strtoname(`var'[1])'
}

**rename hispanic variable
rename hd01_s30 numberhispanic
rename Number__RACE___Total_population total

*drop first row
drop if _n == 1

**prune variables
keep Id Id2 Geography total numberhispanic

*drop rural-urban breakdown
drop if _n > 3143

**save temporarily
save censustemp, replace

*-------------------------------------------------------------------------------
* Use other census data file to create cross walk for counties
*-------------------------------------------------------------------------------

insheet using "co-est2018-alldata.csv", names clear

*drop state totals
drop if stname == ctyname

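*make Id2 (state FIPS * 1000 + county FIPS) as a string to merge on
*note: tostring does not pad with leading zeros (state 1, county 1 gives "1001"),
*so Id2 in censustemp must be in the same form for the merge to match
gen Id2 = (state*1000) + county
tostring Id2, replace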

*-------------------------------------------------------------------------------
* merge in censustemp
*-------------------------------------------------------------------------------
merge 1:1 Id2 using censustemp
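
*optional check (a sketch added for illustration, not part of the original workflow):
*tabulate _merge to see how many counties matched before unmatched rows are dropped
tab _merge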

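*fix DC: its row is removed by the stname == ctyname drop above, so after the
*merge it comes from censustemp only and these fields are filled in by hand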

replace stname = "district of columbia" if Id == "0500000US11001"
replace ctyname = "district of columbia" if Id == "0500000US11001"
replace census2010pop = 601723 if Id == "0500000US11001"


*-------------------------------------------------------------------------------
* remove the few small counties that did not match in both files (DC is kept)
*-------------------------------------------------------------------------------
drop if _merge < 3 & Id != "0500000US11001"
drop _merge

**remove unneeded variables
keep state stname ctyname census2010pop internationalmig2010 npopchg_2010 ///
naturalinc2010 total numberhispanic Id*

*rename
rename state stateid
rename stname state
rename ctyname county

*lower case
foreach var of varlist state county {
 replace `var' = lower(`var')
}

*remove the " county" and " parish" suffixes from the county variable
replace county = subinstr(county," county", "", .)
replace county = subinstr(county," parish", "", .)

*destring number hispanic
destring numberhispanic, replace

*fix county name
replace county = "dona ana" if Id2 == "35013"


*-------------------------------------------------------------------------------
* Let's look at the hispanic population distribution
*-------------------------------------------------------------------------------

*distribution: counties with more than 17896 hispanic residents are in the top ten percent
su numberhispanic , detail
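
*optional sketch (added for illustration): report the 90th percentile directly
*rather than reading it off the summarize output
_pctile numberhispanic, percentiles(90)
display "90th percentile of numberhispanic: " r(r1)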


***
*Combine Kings, Bronx, Richmond and Queens counties (NY) with New York County,
*and Richmond County (VA) with Richmond city
***

replace county = "new york" if state == "new york" & county == "kings"
replace county = "new york" if state == "new york" & county == "bronx"
replace county = "new york" if state == "new york" & county == "richmond"
replace county = "new york" if state == "new york" & county == "queens"
replace county = "richmond city" if state == "virginia" & county == "richmond"

replace Id2 = "1" if county == "new york"
replace Id2 = "51760" if county == "richmond city"
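*list the summed variables explicitly so the result does not depend on variable order
collapse (sum) census2010pop npopchg_2010 naturalinc2010 internationalmig2010 ///
 numberhispanic , by(state county Id2)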
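
*optional check (a sketch added for illustration): Id2 should identify each row
*after the collapse; capture noisily reports a problem without stopping the file
capture noisily isid Id2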


**save, then erase the temp file

save censusdata, replace
erase censustemp.dta
